Clustering tendency analysis algorithm based on data stream
FAN Zhongxin
Journal of Computer Applications    2020, 40 (8): 2248-2254.   DOI: 10.11772/j.issn.1001-9081.2020010057
To address two problems with sampling-based clustering tendency analysis algorithms, namely that their clustering tendency indexes are unstable and one-sided, and that their clustering tendency parameters must be recomputed repeatedly because the algorithms do not suit the batch-incremental nature of data streams, an improved clustering tendency index analysis algorithm based on the Minimum Distance Connected Graph (MDCG), named MDCG-CTI, was proposed; it analyzes all of the data rather than a sample. First, the MDCG was built, with its complexity optimized by using stack-based depth-first traversal to update the nearest path of incremental data. Then, the clustering tendency index was computed to determine the judgment threshold for clustering. Finally, the proposed algorithm was integrated with batch incremental Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Experimental results on self-built datasets show that the proposed algorithm determines clusterability more accurately than existing algorithms for single-cluster data and for data with a large amount of noise. On the large datasets pendigits and avila, the proposed algorithm reduces time consumption by 38% and 42%, respectively, compared to Spectral Visual Assessment of cluster Tendency (SpecVAT). Moreover, combined with batch incremental DBSCAN, it increases average clustering accuracy by 6% and 11% and reduces clustering time by 7% and 8% compared to SpecVAT combined with batch incremental DBSCAN. These results indicate that the proposed algorithm not only determines clustering tendency nonparametrically and accurately, but also improves the effectiveness and operational efficiency of incremental clustering.
Improved BIRCH clustering algorithm based on connectivity distance and intensity
FAN Zhongxin, WANG Xing, MIAO Chunsheng
Journal of Computer Applications    2019, 39 (4): 1027-1031.   DOI: 10.11772/j.issn.1001-9081.2018081790
To address three problems with Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH), namely that its clustering results depend on the order in which data objects are added, that it performs poorly on non-convex clusters, and that the cluster diameter threshold forces every cluster to contain a similar number of data objects, an improved BIRCH algorithm was proposed. In this algorithm, the cluster diameter threshold was replaced by a connectivity distance and an intensity threshold, which describe the connectivity between data objects, and a cluster merging step was added to the generation of the cluster feature tree. Experimental results on custom datasets and on the iris, wine and pendigits datasets show that the proposed algorithm achieves higher clustering accuracy than existing improved algorithms such as multi-threshold BIRCH and density-improved BIRCH; in particular, on large datasets, its accuracy is 6 percentage points higher and its running time 61% lower than those of density-improved BIRCH. The proposed algorithm can be applied to online real-time incremental data processing, identifies non-convex clusters and clusters of uneven volume, provides denoising, and significantly reduces time and space complexity.
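For context, the cluster diameter threshold that this abstract replaces can be illustrated with the classical BIRCH clustering feature, a triple (N, LS, SS) holding the point count, the per-dimension linear sum and the sum of squared norms. The sketch below shows only that standard absorption test; it is not the connectivity distance or intensity threshold of the proposed algorithm, whose definitions the abstract does not give. The names `ClusterFeature` and `try_absorb` are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ClusterFeature:
    """Classical BIRCH clustering feature (N, LS, SS). Illustrative only."""
    n: int            # number of points summarized
    ls: List[float]   # per-dimension linear sum of the points
    ss: float         # sum of squared norms of the points

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusterFeature":
        return cls(1, list(point), sum(x * x for x in point))

    def merged(self, other: "ClusterFeature") -> "ClusterFeature":
        # Clustering features are additive, so merging is component-wise.
        return ClusterFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def diameter(self) -> float:
        # Root of the average squared pairwise distance:
        # D^2 = (2*N*SS - 2*||LS||^2) / (N*(N-1))
        if self.n < 2:
            return 0.0
        ls_sq = sum(x * x for x in self.ls)
        d2 = (2 * self.n * self.ss - 2 * ls_sq) / (self.n * (self.n - 1))
        return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


def try_absorb(cf: ClusterFeature, point: Sequence[float],
               threshold: float) -> Optional[ClusterFeature]:
    """Return the enlarged feature if its diameter stays within the
    threshold, otherwise None (the point would start a new entry)."""
    candidate = cf.merged(ClusterFeature.from_point(point))
    return candidate if candidate.diameter() <= threshold else None
```

Because a single threshold bounds the diameter of every entry, a point at distance 5 from a one-point entry is absorbed when the threshold is 5.0 and rejected when it is 4.9, regardless of how dense or elongated the true cluster is.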